Calibration for Stratified Classification Models

نویسنده

  • Chandler Zuo
چکیده

In classification problems, sampling bias between training data and testing data is critical to the ranking performance of classification scores. Such bias can be both unintentionally introduced by data collection and intentionally introduced by the algorithm, such as under-sampling or weighting techniques applied to imbalanced data. When such sampling bias exists, using the raw classification score to rank observations in the testing data can lead to suboptimal results. In this paper, I investigate the optimal calibration strategy in general settings, and develop a practical solution for one specific sampling bias case, where the sampling bias is introduced by stratified sampling. The optimal solution is developed by analytically solving the problem of optimizing the ROC curve. For practical data, I propose a ranking algorithm for general classification models with stratified data. Numerical experiments demonstrate that the proposed algorithm effectively addresses the stratified sampling bias issue. Interestingly, the proposed method shows its potential applicability in two other machine learning areas: unsupervised learning and model ensembling, which can be future research topics.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Linear and Nonlinear Multivariate Classification of Iranian Bottled Mineral Waters According to Their Elemental Content Determined by ICP-OES

The combinations of inductively coupled plasma-optical emission spectrometry (ICP-OES) and three classification algorithms, i.e., partial least squares discriminant analysis (PLS-DA), least squares support vector machine (LS-SVM) and soft independent modeling of class analogies (SIMCA), for discriminating different brands of Iranian bottled mineral waters, were explored. ICP-OES was used for th...

متن کامل

Development of near infrared reflectance spectroscopy (NIRS) calibration model for estimation of oil content in a worldwide safflower germplasm collection

The development of NIRS calibration model as a rapid, precise, robust, and cost-effective method to estimate oil content in ground seeds of worldwide safflower germplasm collection grown under different agro-climatic conditions was the key objective of this research project. The oil content was measured by accelerated solvent extraction method in a total of 328 samples collected across 2004 (16...

متن کامل

Calibration method of numerical models in coastal sediment studies of Kuhmobarak area

Abstract Application of numerical models is useful in coastal sediment studies; however, it is essential to calibrate the models in an appropriate method. Empirical equations, historical satellite imagery and other coastal data and evidence are proposed for the models calibration in the vicinity of coastal indicators around the site of projects. In this study, coastal sediment process of K...

متن کامل

Automatic Calibration of HEC-HMS Model Using Multi-Objective Fuzzy Optimal Models

Estimation of parameters of a hydrologic model is undertaken using a procedure called “calibration” in order to obtain predictions as close as possible to observed values. This study aimed to use the particle swarm optimization (PSO) algorithm for automatic calibration of the HEC-HMS hydrologic model, which includes a library of different event-based models for simulating the rainfall-runoff pr...

متن کامل

On Calibration and Application of Logit-Based Stochastic Traffic Assignment Models

There is a growing recognition that discrete choice models are capable of providing a more realistic picture of route choice behavior. In particular, influential factors other than travel time that are found to affect the choice of route trigger the application of random utility models in the route choice literature. This paper focuses on path-based, logit-type stochastic route choice models, i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1711.00064  شماره 

صفحات  -

تاریخ انتشار 2017